JOURNAL OF GEOSCIENCE EDUCATION 60, 249-256 (2012) 


Geoscience Data for Educational Use: Recommendations from 
Scientific/Technical and Educational Communities 

Michael R. Taber,1,a Tamara Shapiro Ledley,2 Susan Lynds,3 Ben Domenico,4 and LuAnn Dahlman5


ABSTRACT 

Access to geoscience data has been difficult for many educators. Understanding what educators want in terms of data has 
been equally difficult for scientists. From 2004 to 2009, we conducted annual workshops that brought together scientists, data 
providers, data analysis tool specialists, educators, and curriculum developers to better understand data use, access, and user- 
community needs. All users desired more access to data that provide an opportunity to conduct queries, as well as visual/ 
graphical displays on geoscience data without the barriers presented by specialized data formats or software knowledge. 
Presented here is a framework for examining data access from a workflow perspective, a redefinition of data not as products 
but as learning opportunities, and finally, results from a Data Use Survey collected during six workshops that indicate a 
preference for easy-to-obtain data that allow users to graph, map, and recognize patterns using educationally familiar tools 
(e.g., Excel and Google Earth). © 2012 National Association of Geoscience Teachers. [DOI: 10.5408/12-297.1] 

INTRODUCTION

The use of scientific data can best be characterized in the 
context of workflow: from data acquisition and documentation
regarding acquisition quality, to raw storage, to analysis
resulting in derivatives of the original data, such as model 
output and visualizations (Reichman et al., 2011). Access to 
scientific data at each stage has been difficult for educators. 
Newly acquired data are often proprietary or, if available, only 
open to users well versed in the acronymic vocabulary. Raw 
data are often stored in formats (e.g., hierarchical data format, 
or HDF) that are unusable with the simple data analysis 
software (e.g., Excel) common among educators. Finally, data 
derivatives such as Landsat images provide useful information
but don't provide much opportunity for data analysis (unless 
the user is well versed in remote sensing). 

Hence, what can we learn about data access and use 
from 7 years of Digital Library for Earth System Education 
data services (DDS) and subsequent AccessData workshops 
(http://serc.carleton.edu/usingdata/accessdata/index.html)? 
Attending the workshops were scientists and data providers 
(scientific/technical community) and educators and curriculum
developers (educational community). Participants were
organized into teams of four to six participants, with each
area of community expertise represented. We present here a
three-part discussion. First, we briefly examine the history of
the data workflow problem (with regard to the scientific/technical
and educational communities). Second, we clarify scientific
data in terms of usefulness for the educational community.
Finally, we present what DDS and AccessData workshop 
participants, representing both the scientific/technical and 
the educational communities, said about data use. 


THE DATA WORKFLOW PROBLEM 

Despite recognition in Science for All Americans that 
science demands evidence derived from data (Rutherford, 
1990), scientific data are perceived by educators as enigmatic.
This often leads to frustration in accessing and selecting
data, because the data type and usability are unclear. Almost 
always, the tasks of parsing the data and making decisions 
about how to conduct proper analyses are up to the end user 
or, in this case, the educator (Ledley et al., 2008). 

As handheld devices (e.g., Vernier probes) became more 
popular in schools during the late 1990s, teachers and 
researchers found learning value in student-collected data 
(Tatar et al., 2003). However, school network infrastructure 
and functionality, including access to outside scientific 
dataset sources, were still a challenge for teachers. Moreover, 
teachers lacked the necessary knowledge to extract, parse, 
and import datasets into data visualization and analysis 
software tools still largely reserved for scientists (e.g.,
ArcInfo).

Notwithstanding the technical barriers, the education 
community in the late 1990s and early 2000s recognized the 
importance of the use of scientific data in supporting student 
inquiry. Access to raw, derivative, or model data streams 
supports students' knowledge construction about data 
uncertainties and improves students' quantitative skills 
(Manduca and Mogk, 2002; Creilson et al., 2008). Simultaneously,
curriculum developers looked for data that could be
easily incorporated into labs or activities without the need 
for middleware, extensive data analysis, or expensive display 
software. Educators agreed that data provide a rich context 
from which inquiry can be learned (Bransford et al., 1999). 

Since the early 1990s, there have been rapid advances in 
climate research and remote-sensed data sources—so much 
so that the National Aeronautics and Space Administration
(NASA) and other government agencies began making data 
readily available to both scientists and the public. Initially, 
publicly available data products were images, as exemplified 
by NASA's announcement in 1994 that it intended to make 
space and science data available via the World Wide Web 
(Bell, 1994). The workflow problem for the public (educators)
still existed. In the scientific/technical community, workflow
concerns about data moved beyond storage to the richness
and usefulness of data (Baraniuk, 2011). Usefulness, however,
still meant usefulness for scientists.

In the early 2000s, the DDS project was born. Our goal 
was to engage the educational community in access to and 
analysis of scientific data—essentially, responding to the 
workflow issue of usefulness. The workshop model presented
by Ledley et al. (2012, in this issue) provides a good
structure for the educational community to engage in real- 
world questions and analysis with the scientific/technical 
community. Essentially, the workshop model solves the 
workflow problem for the educational community. 

From 2004 to 2009, we conducted annual workshops 
with teams consisting of a scientist, data provider, tool 
specialist, curriculum developer, and educator. Each team 
was charged with identifying a particular dataset or several 
datasets that would be of interest to the geoscience 
education community. A team initiated this focus by 
completing a Datasheet (Ledley et al., 2008), which 
described a particular scientific dataset with human-readable,
educationally relevant metadata to facilitate exploration
of the data by educators and students
(http://serc.carleton.edu/usingdata/browse_sheets.html). Essentially, the Datasheet
provided a critical opportunity for both the scientific/
technical and the educational communities to openly discuss 
and resolve the data workflow problem. 

Datasheets highlight the connections among datasets, 
specific topics in science, and skills students can build by 
using the dataset. Datasheets also identify the analysis tools 
that can be used to access and analyze the data and provide 
examples of resources, when available, of how to acquire, 
interpret, and analyze the data. Information is presented at a 
level appropriate for those who don't have specialized 
knowledge of the discipline in which the data are commonly 
used. Datasheets are designed to support novice or out-of-field
data users by providing them with the knowledge
necessary to obtain and use data appropriately for scientific 
explorations. Datasheets also provide the meanings for 
acronyms and other jargon that users are likely to encounter 
and include links to journal articles and educational 
resources that cite or use the data. 

Parallel to the identification of the dataset, the workshop
team determined an appropriate analysis tool that would assist
the team in developing a case study. The case study provided
the foundation for much of the team's workshop efforts: 
developing a story line that afforded a reason for caring 
about the data and a framework for creating an Earth 
Exploration Toolbook chapter (Ledley et al., 2011), an online 
activity that provides step-by-step instructions for accessing 
and analyzing the data around a scientific concept or issue. 

DEFINING LEVELS OF DATA FOR LEARNING 
OPPORTUNITIES 

In 2004, the NASA Earth Observing System (EOS) 
created four levels of data products, which were both
spatially and temporally described, and served as the basis
for data distribution to the broader geoscience community 
(King et al., 2004): 

• Level 0—Raw binary 

• Level 1A—Unprocessed instrument data 

• Level 1B—Processed data into sensor units

• Level 2—Data derived into geophysical variables 

• Level 3—Geophysical variables mapped in uniform 
space and time 

• Level 4—Model output or analysis output that 
includes lower-level data. 

The processed EOS data were almost entirely stored and 
distributed in HDF—a format not useful to the educational 
community. 

If we are to consider the workflow characteristic of
usefulness for educators, then we must rethink the data
products in terms of learning opportunities (Table I). Thus, 
we consider rethinking level 2 data—the first opportunity for 
data analysis—to be stored in universal formats rather than 
technical formats. In addition, level 2 data should include 
metadata characterized from the scientific workflow. Finally, 
adding level 5 data as easy-to-use or display-image data 
distinguishes the data product from level 4, which may still 
provide data analysis opportunities. 
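As a rough illustration of what level 2 data in a universal format might look like, the following Python sketch writes a small comma-separated text file whose first lines carry metadata and then reads it back. The `#`-prefixed metadata convention, the function names, and the sample values are assumptions made for this example; they are not a format described by EOS or by the workshop participants.

```python
import csv
import io

def write_level2_csv(metadata, header, rows):
    """Return CSV text: '#'-prefixed metadata lines, then a header row and data rows."""
    buf = io.StringIO()
    for key, value in metadata.items():
        buf.write(f"# {key}: {value}\n")
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()

def read_level2_csv(text):
    """Split CSV text back into a metadata dict and a list of row dicts."""
    metadata, data_lines = {}, []
    for line in text.splitlines():
        if line.startswith("#"):
            # Everything before the first colon is the key, the rest is the value.
            key, _, value = line[1:].partition(":")
            metadata[key.strip()] = value.strip()
        elif line.strip():
            data_lines.append(line)
    return metadata, list(csv.DictReader(data_lines))

# Hypothetical sample values, for illustration only.
sample_text = write_level2_csv(
    {"variable": "sea surface temperature",
     "units": "degrees Celsius",
     "source": "hypothetical example dataset"},
    ["latitude", "longitude", "sst"],
    [(10.0, -150.0, 27.4), (10.0, -149.0, 27.1)],
)
sample_meta, sample_rows = read_level2_csv(sample_text)
```

A file of this kind opens directly in a spreadsheet application, and the metadata lines travel with the data rather than living in a separate document.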

With the data levels redefined, the first opportunity for 
educational use is with level 1A data. Here, a student 
collecting "temperature" data in the field with a handheld, 
electronic datalogger would really be collecting electronic 
signals (i.e., millivolts). However, the datalogger's firmware 
would interpret the millivolt readings as temperature 
outputs. The student may not be fully aware of the 
firmware, unless the student task is to check the sensor 
operability by analyzing the millivolt output. Scientists and 
data providers might consider publishing level 1A data, 
particularly for students who might be interested in sensor 
analysis. 
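The firmware step described above can be sketched in a few lines of Python. This example assumes a hypothetical sensor with a linear response of 10 millivolts per degree Celsius and no offset; real probes publish their own calibration constants, and the readings shown are invented for illustration.

```python
def millivolts_to_celsius(millivolts, mv_per_degree=10.0, offset_mv=0.0):
    """Convert a raw sensor signal (level 1A) to degrees Celsius (level 1B).

    Assumes a linear sensor; mv_per_degree and offset_mv are hypothetical
    calibration constants, not values from any specific datalogger.
    """
    return (millivolts - offset_mv) / mv_per_degree

# Hypothetical logged signals, in millivolts.
raw_readings_mv = [215.0, 220.0, 218.5]
temperatures_c = [millivolts_to_celsius(mv) for mv in raw_readings_mv]
```

Seeing both columns side by side lets a student check whether a surprising temperature came from the environment or from the sensor signal itself.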

Educators could also find level 1B data useful, because
level 1B data provide an opportunity for students to check for
erroneous data outputs. In the era of simpler, easier-to-use
data visualization tools, such as MyWorldGIS
(http://www.myworldgis.org/), Environmental Systems Research
Institute's (ESRI's) ArcGIS Explorer (http://www.esri.com), and
Google Earth (http://www.google.com), redefining level 2-4
data becomes increasingly important.
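One way students might check level 1B data for erroneous outputs is to flag readings that sit far from the rest of the series. The sketch below uses the modified z-score based on the median absolute deviation, with the commonly cited cutoff of 3.5. This is a technique chosen for this example, not a method prescribed by the workshops, and the readings are hypothetical.

```python
from statistics import median

def flag_suspect_readings(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds the threshold."""
    center = median(values)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # All readings (or most of them) are identical; nothing can be ranked.
        return []
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - center) / mad > threshold]

# Hypothetical temperature series with one implausible spike.
readings_c = [21.4, 21.6, 21.5, 85.0, 21.7, 21.5]
suspect_indices = flag_suspect_readings(readings_c)
```

A flagged reading is a prompt for discussion rather than proof of sensor error; deciding whether to keep or discard it is part of the learning opportunity.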

Despite inherent ease of use, EOS did not describe a 
level 5 product. However, we define level 5 as static 
derivative imagery, such as Graphics Interchange Format 
(.gif) or Joint Photographic Experts Group (.jpg), used 
primarily in education for presentations. ImageJ
(http://rsbweb.nih.gov/ij/index.html) was presented in the 2006
AccessData workshop as a tool for displaying, editing, 
analyzing, and processing of 8-bit, 16-bit, and 32-bit .gif and 
.jpg images. Using ImageJ in the educational setting 
constitutes use at level 4 with level 5 data formats. 


TABLE I: Levels of data defined based on data format and learning opportunities. Adapted from NASA's EOS; levels 0, 1A, 1B, 3,
and 4 are the original four data levels described by EOS.

| Level | Characterized by | Learning Opportunities | Possible Education Obstacles | Example End-User Analysis |
|---|---|---|---|---|
| Level 0 | Raw binary format; sensory input | Where data come from | Often requires engineering knowledge; needs an expert observer | |
| Level 1A | Reconstructed, unprocessed instrument data | Interpretation and uncertainty | Requires software-specific programming knowledge | Analyze sensor output for functionality |
| Level 1B | Data reprocessed into sensor units but perhaps with both known and unknown errors | Spatial and temporal data discovery skills; improved certainty in data, particularly with analysis of error | Teacher time constraints related to data extraction; students' lack of basic statistical knowledge for examining errors | Analyze for sensor error that may lead to erroneous results |
| Level 2 | Data in a basic, universally acceptable format (e.g., .txt) | Importing data for analysis; opportunity for error analysis if complemented by metadata (data about data) | Time constraints for importing data into analysis software | Spreadsheet applications: graphing; statistics |
| Level 3 | Variables mapped with known spatial scales (latitude and longitude or grid) | Different ways to visually display data in response to new scientific questions; user control of data; distributed data access | Metadata, if missing; potentially expensive analysis and/or display software | GIS mapping; queries |
| Level 4 | Visual (usually) display of data analysis or modeling output (e.g., line graph or map) | Quick access to data/information; visually stimulating; good gateway for sophisticated data analysis on data from all levels | Metadata, if missing; potentially expensive analysis and/or display software | Visualization or image (pixel) analysis; modeling; classification |
| Level 5 | Easy-to-use/universal display-image data | Presentation of information | Cannot be manipulated in pursuit of new questions | Pattern recognition on static images |


WHAT WORKSHOP PARTICIPANTS SAID
ABOUT DATA USE

Data informing this commentary were collected through
the use of an anonymously delivered Data Use Survey,
which consisted of 10 questions (Table II). A total of 237
participants responded to the Data Use Survey (2005-2009).
In the survey, participants self-identified with one or more
workshop roles: scientist, data provider, tool specialist,
curriculum developer, or educator. However, for the purpose of this
analysis, we only examined the participant's primary
self-identification. Of the 237 total respondents, 56% primarily
identified with the scientific/technical community and 46%
identified with the educational community.

The Data Use Survey provided insight into the
participants' current experience of using geoscience data in
achieving educational goals. Open-ended questions were
categorized and coded for dominant themes. Surveys were
distributed to participants by an external evaluator in
scheduled sessions, and time was allotted for participants
to complete the surveys before leaving the session. This
methodology is helpful in maximizing the response rate (>90%).

TABLE II: Questions asked on the Data Use Survey (2005-2009). Questions in 2004 were revised to inform the survey.

| Number | Type | Question |
|---|---|---|
| 1 | Multiple response | What is your primary role at the Data Services workshop? (Please mark your primary role with a "1" and check any others that apply.) |
| 2 | Multiple response | For which learning goals have you successfully used data within educational contexts? (Check all that apply.) |
| 3 | Multiple response | Which of the following data have you used successfully? (Check all that apply.) |
| 4 | Multiple response | Which of the following data formats have you used successfully? (Check all that apply.) |
| 5 | Multiple response | Which of the following data sources have you used more than once? (Check all that apply.) |
| 6 | Single response | Have you found it necessary to modify datasets before they were used by an end-user/learner (e.g., selected subset, imported into Excel)? |
| 7 | Multiple response | What data analysis/visualization tools do you commonly use? |
| 8 | Multiple response | What data analysis procedures have your end-users/learners performed on the data? (Check all that apply.) |
| 9 | Rank | Have you made any attempts to obtain and use datasets that were NOT successful? If yes, what barriers did you encounter? (Please rank 1, 2, and 3 in order of priority.) |
| 10 | Multiple response | What types of instruction or support are most helpful to you when using specific datasets? (Check all that apply.) |

FIGURE 1: Successful use of data within educational contexts, as identified by purpose, derived from Data Use
Survey question 2, "For which learning goals have you successfully used data within educational contexts? (Check all
that apply.)" Understanding Climate and Topics in Environmental Science were added in 2007.

The first workshop (2004) provided opportunities to 
refine the Data Use Survey questions. The 2004 survey asked 
open-ended questions that led to the development of the 
pointed survey questions for subsequent workshops. Each 
workshop provided a unique population of attendees giving 
unique responses. Most questions resulted in multiple 
response data and were categorized using dichotomies. 
The data were then analyzed using a simple frequency 
procedure for all variables that constituted the set of possible 
answers. The frequency procedure used in the analysis 
produced both counts and percentages for all variables that 
made up the multiple response set. The advantage here is 
that the reported percentages are based on the total number 
of responses for each participant role (i.e., the educational 
community or the scientific/technical community). 
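The frequency procedure described above can be approximated with a short Python sketch. It treats each answer option as a dichotomy (selected or not) and reports counts and percentages per community, using the total number of selections made by that community as the percentage base. That reading of "total number of responses" is an assumption on our part, and the sample responses are hypothetical rather than survey data.

```python
from collections import Counter, defaultdict

def multiple_response_frequencies(responses):
    """Tabulate a check-all-that-apply question by community.

    `responses` is a list of (community, selected_options) pairs.
    Returns {community: {option: (count, percent_of_community_selections)}}.
    """
    counts = defaultdict(Counter)
    for community, selected in responses:
        # set() guards against an option being counted twice for one respondent.
        counts[community].update(set(selected))
    table = {}
    for community, counter in counts.items():
        total = sum(counter.values())
        table[community] = {
            option: (n, round(100.0 * n / total, 1))
            for option, n in counter.items()
        }
    return table

# Hypothetical responses, for illustration only.
example_responses = [
    ("educational", ["graphing", "mapping"]),
    ("educational", ["graphing"]),
    ("scientific/technical", ["graphing", "mapping", "statistics"]),
    ("scientific/technical", ["mapping"]),
]
example_table = multiple_response_frequencies(example_responses)
```

Because the base is the number of selections rather than the number of respondents, the percentages within each community sum to 100, which matches the scale of the values reported in the figures.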


Successful Use of Data in Educational Contexts 

When asked about the purpose for which a participant 
successfully used data within educational contexts, the 
scientific/technical community identified interpreting satellite 
imagery (14.1%) and personal exploration and learning (13.2%) 
as their top two choices (Fig. 1). The educational community 
identified personal exploration (12.1%) and understanding 
weather (11.8%) as their top two choices. Not surprisingly, 
weather maps and satellite imagery are visual and offer both 
geoscientists and educators an opportunity to quickly view and 
analyze level 4 data, level 5 data, or both (Table I). In addition, 
11.5% of the educational community is interested in using data 
to meet science education standards and for pattern recognition (a 
common task for performing inquiry in the classroom), 
whereas only 6% of the scientific/technical community is 
interested in using data for meeting science standards. 

The educational and scientific/technical communities
both value graphing, data visualization, and mapping as end-user
analysis methods (Fig. 2). This complements the desire
of science educators to have students experience doing
science (Manduca and Mogk, 2002). Graphing of level 1-3
data provides unique opportunities to understand uncertainty
and the importance of manipulating data (Metz, 2004).
The visualization and mapping of level 2 data, with
geographic coordinates, and the manipulation of level 3
data provide learners with the opportunity to learn
symbolism, which is associated with developing the cognitive
spatial and temporal skills necessary to conduct queries on
multiple datasets (Downs et al., 1988; Kastens et al., 2009;
Montello, 2009). Further analysis of end-user preferences
for data analysis methods did not reveal any significant trends
from 2005 to 2009.

FIGURE 2: Preferred end-user analysis methods as differentiated by community. Data derived from Data Use Survey
question 8, "What data analysis procedures have your end-users/learners performed on the data? (Check all that
apply.)"

Workshop attendees were asked to indicate preferred
geoscience data formats (.txt, NetCDF, .jpg, etc.) for
successful educational use of data. Initially, in 2005, users
from both the educational and the scientific/technical
communities preferred level 4 or 5 image data (Table I),
because the data were readily available and easy to use (e.g.,
in presentations) without the need for sophisticated,
specialized application-server interfacing software or middleware
(Fig. 3). However, by 2007, all users indicated a
preference for visualization-based level 3 data (i.e., geographic
information system, or GIS). This more than likely
coincides with the emergence of virtual "globes," such as
NASA's WorldWind, ESRI's ArcGIS Voyager, Pasco Scientific's
MyWorld GIS, and Google Earth, as educator-friendly
tools for teaching about the Earth (Kerski, 2008).

NASA and the United States Geological Survey (USGS)
were the predominant choices for data sources of both user
communities (Fig. 4). This is most likely due to both
government agencies offering all levels of data to the
public. Moreover, NASA and the USGS, along with the
Environmental Protection Agency (EPA) and the National
Oceanic and Atmospheric Administration (NOAA), have
geoscience data collection as an important part of their
mission. Interestingly, the EPA and the Global Learning and
Observations to Benefit the Environment (GLOBE) program
were preferred more by the educational community than by
the scientific/technical community. In the case of GLOBE,
data are largely collected and used by the educational
community.

FIGURE 3: Change in data preference by level from 2005
to 2009 workshops. Data by all users are presented. The
increase in 2007 for level 3 data preference was
dominated by an increase in preference by the educational
community. NetCDF, or network common data
form, is a collection of data libraries commonly used by
atmospheric scientists. HDF-EOS is a multiobject file
format commonly used in NASA's EOS. ASCII-Text is a
common form of data usually presented as .csv or
comma-separated values. GIS (.shp) is a spatial data file
commonly used in ESRI products. GeoTIFF is a Tagged
Image File Format image with embedded georeferencing
metadata. A KMZ file is a compressed Keyhole Markup Language
file used by Google Earth. The .jpg and .gif files are
commonly used in digital imagery.


RECOMMENDATIONS 

Continued Need for Overcoming Barriers to Accessing 
Data (Educational Workflow) 

During the first three workshops (2004-2006), scientific/ 
technical and educational communities both identified five 
primary barriers to data access: 

1. Users not being able to locate desired data 

2. Unusable formats or unknown file extensions associated
with the data

3. Poor documentation (metadata) 

4. Datasets too large for parsing 

5. Users not having required software for processing or 
analyzing datasets. 


By 2006, workshop participants indicated a reduction in 
some barriers. In particular, participants indicated major 
improvements in the first two barriers: locating data and 
finding data in usable formats. Participants also indicated 
slight improvements in locating metadata associated with 
datasets. However, datasets are still either too large or too 
difficult to parse for meeting the specific learning needs of 
educators. In addition, developing sophisticated analysis 
(e.g., knowledge about building Google Earth's .kmz files) is 
beyond the knowledge of most classroom teachers. 
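To give a sense of what building a .kmz file involves, the Python sketch below writes a minimal Keyhole Markup Language document with one placemark per station and packages it as a KMZ, which is a zip archive conventionally holding the KML as `doc.kml`. The station names and coordinates are hypothetical, and a classroom-ready file would typically add descriptions, styles, or data values.

```python
import io
import zipfile
from xml.sax.saxutils import escape

def build_kml(placemarks):
    """Return a KML document string with one Placemark per (name, lon, lat)."""
    parts = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<kml xmlns="http://www.opengis.net/kml/2.2">',
             '<Document>']
    for name, lon, lat in placemarks:
        # KML coordinates are ordered longitude,latitude,altitude.
        parts.append(
            f"<Placemark><name>{escape(name)}</name>"
            f"<Point><coordinates>{lon},{lat},0</coordinates></Point></Placemark>"
        )
    parts += ['</Document>', '</kml>']
    return "\n".join(parts)

def build_kmz(placemarks):
    """Return KMZ bytes: a zip archive holding the KML as doc.kml."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("doc.kml", build_kml(placemarks))
    return buf.getvalue()

# Hypothetical stations, for illustration only.
stations = [("Station A", -105.27, 40.01), ("Station B", -93.15, 44.46)]
kmz_bytes = build_kmz(stations)
```

Writing `kmz_bytes` to a file with a .kmz extension should yield something Google Earth can open. Even this small example shows why the step is a barrier: it requires familiarity with XML, coordinate ordering, and archive formats.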

In February 2010, the AccessData team invited previous
workshop participants to a follow-up Impacts workshop. The participants
were placed in three groups of 10 people and asked to reflect
on open-ended questions, such as "Has participation in the
workshop(s) impacted the way you have used/prepared
geoscience data or tools in/for education?"

Self-identified scientists in the Impacts workshop 
articulated the importance of having an educator as a 
partner on the scientific research team. The educator 
provides expertise in helping the scientist determine how 
to make the research data easier for educators to use. 
Educators in the Impacts workshop indicated that involving 
information technology specialists at the institutional level 
was key to overcoming issues with data access. Thus, 
involving educators in scientific projects and technology 
specialists in educational activities are two significant 
recommendations (Lynds and Buhr, 2009, 2010a, 2010b). 

One final recommendation is related to the support 
necessary for successful use of data for education activities. 
Both user communities suggested three critical components 
for improving data access and use (listed by priority): 

1. Providing real-world examples, in the context of 
scientific questions, for users accessing data 

2. Developing step-by-step instructions to fully understand
how to conduct analysis on the data

3. Providing online (video) tutorials, coupled with 
metadata. 

Data can now be delivered using a client-server
approach, where the end user no longer has to mine the 
Web for data sources that match formats required by 
analysis software. The scientific/technical community needs 
to portray quality, integrity, and relevance of data. This 
means presenting data in context (e.g., a visual image of an 
ocean ridge providing a link to geophysical data, such as a 
USGS earthquake map). The scientific/technical community 
also needs to provide detailed metadata with its datasets. 
The metadata should provide sufficient detail so that 
required insider information to access data (e.g., the name 
of the ship or cruise number if the user is looking to plot 
ocean salinity data) is either not necessary or easily indexed. 

Importantly, scientists should engage educators at the 
beginning of research projects so that the project team can 
make informed decisions about how and in what context 
data will be accessed, processed, and analyzed. Clearly, the 
educational community has a strong desire for data that offer 
opportunities for data analysis by students. 

Educators need to be aware of the potential for
metacognitive learning that data levels 2-4 present. Educators
know knowledge building happens when the learner
can conduct meaningful analysis. As the scientific/technical
community makes level 2 and 3 data access and use easier,
the educational community needs to revisit the curriculum,
adding the extra time necessary for students to process level
2 and 3 data. Tools such as spreadsheet applications and
spatial analysis are becoming increasingly important for
educators to embrace and integrate into their teaching. As
the scientific/technical community moves to engage the
educational community more significantly in research
projects, the end result will be a richer, scientifically based,
user-friendly set of data available for the classroom.

FIGURE 4: Data source use designated by scientific/technical and educational communities. Data derived from Data
Use Survey question 5, "Which of the following data sources have you used more than once? (Check all that apply.)"
*NOAA data sources include the National Climatic Data Center, National Weather Service, National Geophysical
Data Center, and National Oceanographic Data Center. NOAA received a single count per survey if one or more of the
preceding sources were selected.